AMENDMENTS TO CLAIMS 

This listing of claims will replace all prior versions, and 
listings, of claims in the application: 

Listing of Claims: 

1. (Currently Amended) An automatic speech segmentation
and verification method for segmenting into speech unit
segments and verifying said speech unit segments by
determining which phonetic units defined by a known text
script are to be accepted for output, said phonetic units
accepted for output being used for speech synthesis,
comprising:

a retrieving step, for retrieving the recorded
speech corpus, the recorded speech corpus corresponding
to the known text script, the known text script
defining phonetic information with N said phonetic units;

a segmenting step, for segmenting the recorded speech
corpus into N test speech unit segments referring to the 
phonetic information of the N phonetic units in the known 
text script; 

a segment-confidence-measure verifying step, for
verifying segment confidence measures of N cutting
points of the N test speech unit segments to determine if
the N cutting points of the N test speech unit segments are
correct;

a phonetic-confidence-measure verifying step, for 
verifying phonetic confidence measures of the test speech 
unit segments to determine if the test speech unit segments 
correspond to the known text script; and 

a determining step, for determining acceptance of the
phonetic unit by comparing a combination of the segment
confidence measures and the phonetic confidence
measures of the test speech unit segments to a
predetermined threshold value; wherein if the combined
confidence measure is greater than the predetermined
threshold value, the phonetic unit is accepted for output.
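
The determining step of claim 1 can be illustrated with a minimal sketch. The claim does not state how the two measures are combined, so the plain sum used here is an assumption, and the function name `accept_units` and its parameters are hypothetical.

```python
def accept_units(segment_cms, phonetic_cmv, threshold):
    """Decide acceptance for each phonetic unit.

    segment_cms  -- segment confidence measures, one per unit
    phonetic_cmv -- phonetic confidence measures, one per unit
    threshold    -- predetermined threshold value

    ASSUMPTION: the "combination" is a plain sum; the claim leaves
    the combination rule open.
    """
    if len(segment_cms) != len(phonetic_cmv):
        raise ValueError("need one value of each measure per unit")
    # A unit is accepted only if the combined measure is strictly
    # greater than the threshold, as the claim words it.
    return [(cms + cmv) > threshold
            for cms, cmv in zip(segment_cms, phonetic_cmv)]
```

With this rule, a unit whose combined measure exactly equals the threshold is rejected, which matches the "greater than" wording of the claim.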

2. (Original) The method as claimed in claim 1, wherein
the segmenting step further comprises: 

using a hidden Markov model (HMM) to cut the recorded 
speech corpus into N test speech unit segments referring to 
the phonetic information of the N phonetic units in the 
known text script, wherein each test speech unit segment is 
defined as correspondingly having an initial cutting point; 

performing a fine adjustment on the initial cutting 
point of the test speech unit segment according to at least 
one feature factor corresponding to each test speech unit 
segment and calculating at least one cutting point fine 
adjustment value corresponding to each test speech unit 
segment; and 

integrating the initial cutting point and the cutting 
point fine adjustment value of the test speech unit segment 
to obtain a cutting point of the test speech unit segment. 

3. (Original) The method as claimed in claim 2, wherein
the feature factor of the test speech unit segment is a 
neighboring cutting point of the initial cutting point. 

4. (Original) The method as claimed in claim 2, wherein
the feature factor of the test speech unit segment is a 
zero crossing rate (ZCR) of the test speech unit segment. 
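
The zero crossing rate named in claim 4 is not defined in the claims. The sketch below uses one common definition (the fraction of adjacent sample pairs whose signs differ), which is an assumption; the function name is hypothetical.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs that change sign.

    ASSUMPTION: a sample equal to zero is treated as non-negative;
    the claims do not fix a convention.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(samples) - 1)
```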

5. (Original) The method as claimed in claim 2, wherein
the feature factor of the test speech unit segment is an 
energy value of the test speech unit segment. 

6. (Original) The method as claimed in claim 5, wherein
the energy value is an energy value of a band pass signal
and a high pass signal retrieved from a speaker-dependent
band.

7. (Original) The method as claimed in claim 2, wherein
each cutting point fine adjustment value has a weighted 
value, and the cutting point of the test speech unit 
segment is a weighted average of the initial cutting point 
and the cutting point fine adjustment value. 
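
The weighted average of claim 7 can be sketched as follows. It is assumed here that each fine adjustment value is itself a candidate cutting-point position (for example, in milliseconds) and that the initial cutting point carries its own weight; neither detail is stated in the claim, and all names are hypothetical.

```python
def fuse_cutting_point(initial, initial_weight, adjustments):
    """Weighted average of an initial cutting point and its
    fine adjustment values.

    adjustments -- list of (value, weight) pairs, one per feature
                   factor (ASSUMPTION: values are candidate positions
                   in the same unit as `initial`).
    """
    total_weight = initial_weight + sum(w for _, w in adjustments)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = initial_weight * initial + sum(
        v * w for v, w in adjustments
    )
    return weighted_sum / total_weight
```

For instance, an initial point at 100.0 with weight 1.0 and a single adjustment value of 120.0 with weight 1.0 yields a cutting point of 110.0.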

8. (Original) The method as claimed in claim 1, wherein
in the segment-confidence-measure step, each segment
confidence measure of the test speech unit segment is:

$$\mathrm{CMS} = \max\Bigl(1 - h(D) - \sum g(c(s), f(s)),\ 0\Bigr),$$

where $h(D) = K\bigl(\sum_i w_i\,|d_i - \bar{d}|\bigr)$, $D$ is a vector of multiple
expert decisions of the cutting point, $d_i$ is the cutting
point, $\bar{d} = p(D)$ is a final decision of the cutting point,
$K(x)$ is a monotonically increasing function that maps a
non-negative variable $x$ into a value between 0 and 1,
$g(c(s), f(s))$ is a cost function value ranging from 0 to 1,
$s$ is a segment, $c(s)$ is a type category of the segment $s$,
and $f(s)$ are acoustic features of the segment.
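
A minimal numeric sketch of the segment confidence measure follows. The claim leaves $K$ and $p$ unspecified, so this sketch assumes $K(x) = x / (1 + x)$ and takes $p(D)$ to be the weighted mean of the expert decisions; both choices, and the function names, are hypothetical.

```python
def squash(x):
    """Hypothetical K(x): maps x >= 0 monotonically into [0, 1)."""
    return x / (1.0 + x)


def segment_confidence(decisions, weights, costs):
    """CMS = max(1 - h(D) - sum of costs, 0).

    decisions -- expert decisions d_i of the cutting point
    weights   -- weights w_i, one per decision
    costs     -- precomputed cost values g(c(s), f(s)), each in [0, 1]

    ASSUMPTION: the final decision p(D) is the weighted mean of D.
    """
    if not decisions or len(decisions) != len(weights):
        raise ValueError("need one weight per expert decision")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    final = sum(w * d for w, d in zip(weights, decisions)) / total
    # Weighted disagreement between the experts and the final decision.
    spread = sum(w * abs(d - final) for w, d in zip(weights, decisions))
    return max(1.0 - squash(spread) - sum(costs), 0.0)
```

When all experts agree and no cost is incurred, the measure is 1.0; disagreement among experts or nonzero costs pull it toward the floor of 0. Because `spread` depends on the unit of the cutting points, a real system would need a $K$ scaled to that unit.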

9. (Original) The method as claimed in claim 1, wherein
in the phonetic-confidence-measure step, each phonetic 
confidence measure of the test speech unit segments is: 

$$\mathrm{CMV} = \min\{\mathrm{LLR}_I,\ \mathrm{LLR}_F,\ 0\},$$

where

$$\begin{cases}
\mathrm{LLR}_I = \log P(X_I \mid H_0) - \log P(X_I \mid H_1) \\
\mathrm{LLR}_F = \log P(X_F \mid H_0) - \log P(X_F \mid H_1)
\end{cases}$$

$X_I$ is an initial segment of the test speech unit segment, $X_F$ is a
final segment of the test speech unit segment, $H_0$ is a null
hypothesis of the test speech unit segment recorded
correctly, $H_1$ is an alternative hypothesis of the test
speech unit segment recorded incorrectly, and LLR is a log
likelihood ratio.
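
The phonetic confidence measure can be computed directly from four log-likelihoods. The sketch below assumes those log-likelihoods have already been produced by some verification models; the function and parameter names are hypothetical.

```python
def phonetic_confidence(logp_init_h0, logp_init_h1,
                        logp_final_h0, logp_final_h1):
    """CMV = min(LLR_I, LLR_F, 0).

    Each argument is a log-likelihood log P(X | H) for the initial
    (X_I) or final (X_F) part of a segment under the null (H_0) or
    alternative (H_1) hypothesis.
    """
    llr_initial = logp_init_h0 - logp_init_h1
    llr_final = logp_final_h0 - logp_final_h1
    # The measure is capped at 0, so it only penalizes a segment
    # when at least one part favors the alternative hypothesis.
    return min(llr_initial, llr_final, 0.0)
```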

10. (Original) An automatic speech segmentation and
verification system comprising: 

a database for storing a known text script and a 
recorded speech corpus corresponding to the known text 
script, and the known text script has phonetic information 
with N speech unit segments, wherein N is a positive integer;

a speech unit segmentor for segmenting the recorded 
speech corpus into N test speech unit segments referring to 
the phonetic information of the known text script; 

a segmental verifier for verifying the correctness of 
the cutting points of test speech unit segments by 
obtaining a segmental confidence measure;

a phonetic verifier for obtaining a confidence measure 
of syllable verification by using verification models for 
verifying whether the recorded speech corpus is correctly 
recorded; and 

a speech unit inspector for integrating the confidence 
measure of syllable segmentation and the confidence measure 
of syllable verification to determine whether the test 
speech unit segment is accepted. 

11. (Original) The system as claimed in claim 10,
wherein the segmental verifier performs the following
steps:

using a hidden Markov model (HMM) to cut the recorded 
speech corpus into N test speech unit segments referring to 
the phonetic information of the N phonetic units in the 
known text script, wherein each test speech unit segment is 
defined as correspondingly having an initial cutting point;

performing a fine adjustment on the initial cutting
point of the test speech unit segment according to at least 
one feature factor corresponding to each test speech unit 
segment and calculating at least one cutting point fine 
adjustment value corresponding to each test speech unit 
segment; and

integrating the initial cutting point and the cutting 
point fine adjustment value of the test speech unit segment 
to obtain a cutting point of the test speech unit segment. 

12. (Original) The system as claimed in claim 11,
wherein the feature factor of the test speech unit segment
is a neighboring cutting point of the initial cutting
point.

13. (Original) The system as claimed in claim 11,
wherein the feature factor of the test speech unit segment
is a zero crossing rate (ZCR) of the test speech unit
segment.

14. (Original) The system as claimed in claim 11,
wherein the feature factor of the test speech unit segment 
is an energy value of the test speech unit segment. 

15. (Original) The system as claimed in claim 14,
wherein the energy value is an energy value of a band pass
signal and a high pass signal retrieved from a
speaker-dependent band.

16. (Original) The system as claimed in claim 11,
wherein each cutting point fine adjustment value has a 
weighted value, and the cutting point of the test speech 
unit segment is a weighted average of the initial cutting 
point and the cutting point fine adjustment value.

17. (Original) The system as claimed in claim 10,
wherein each segment confidence measure of the test speech 
unit segment is determined by: 

$$\mathrm{CMS} = \max\Bigl(1 - h(D) - \sum g(c(s), f(s)),\ 0\Bigr),$$

where $h(D) = K\bigl(\sum_i w_i\,|d_i - \bar{d}|\bigr)$, $D$ is the vector of multiple
expert decisions of the cutting point, $d_i$ is the cutting
point, $\bar{d} = p(D)$ is a final decision of the cutting point,
$K(x)$ is a monotonically increasing function that maps a
non-negative variable $x$ into a value between 0 and 1,
$g(c(s), f(s))$ is a cost function value ranging from 0 to 1,
$s$ is a segment, $c(s)$ is the type category of the segment $s$,
and $f(s)$ is the acoustic feature of the segment.

18. (Currently Amended) The system as claimed in
claim 10, wherein each phonetic confidence measure of the 
test speech unit segments is determined by: 

$$\mathrm{CMV} = \min\{\mathrm{LLR}_I,\ \mathrm{LLR}_F,\ 0\},$$

where

$$\begin{cases}
\mathrm{LLR}_I = \log P(X_I \mid H_0) - \log P(X_I \mid H_1) \\
\mathrm{LLR}_F = \log P(X_F \mid H_0) - \log P(X_F \mid H_1)
\end{cases}$$

$X_I$ is an initial segment of the test speech unit segment, $X_F$ is a
final segment of the test speech unit segment, $H_0$ is a null
hypothesis of the test speech unit segment recorded
correctly, $H_1$ is an alternative hypothesis of the test
speech unit segment recorded incorrectly, and LLR is a log
likelihood ratio.